Pan-Cancer Analysis Reveals Long Non-coding RNA (lncRNA) Embryonic Stem Cell-Related Gene (ESRG) as a Promising Diagnostic and Prognostic Biomarker

Background: Embryonic stem cell-related gene (ESRG; also known as HESRG) is a long non-coding RNA (lncRNA) involved in regulating the self-renewal of human pluripotent stem cells (hPSCs). ESRG can interact with chromatin, different RNA types, and RNA-binding proteins (RBPs); these interactions have led to ESRG being considered an oncogenic lncRNA, and its expression has been detected in various tumor tissues. This study aimed to evaluate the potential diagnostic and prognostic value of ESRG in various human cancers. Materials and methods: ESRG expression in various cancers was analyzed using the Gene Expression Profiling Interactive Analysis (GEPIA), Tumor Immune Estimation Resource (TIMER), and University of Alabama at Birmingham Cancer Data Analysis Portal (UALCAN) databases. The correlation between ESRG expression and clinicopathological parameters was analyzed using UALCAN. The effect of ESRG expression on survival outcome was evaluated using Kaplan-Meier plotter, UALCAN, GEPIA, and TIMER. The correlation between ESRG expression and immune cell infiltration was studied using TIMER. Genetic alterations were investigated using cBioPortal. Our findings were validated by GEO2R analysis of Gene Expression Omnibus (GEO) datasets. Results: ESRG was significantly upregulated in colon adenocarcinoma (COAD) and lung squamous cell carcinoma (LUSC) (p < 0.001), as well as in rectum adenocarcinoma (READ) and uterine corpus endometrial carcinoma (UCEC) (p < 0.01). Regarding pathological stages, ESRG was significantly upregulated in stages 2, 3, and 4 compared with normal tissue in COAD and in stages 1, 2, and 3 in LUSC. The combined prognostic analysis showed that upregulated ESRG expression was associated with better survival outcomes in patients with brain lower-grade glioma (LGG). Our results demonstrate a significant negative correlation between ESRG expression and the abundance of CD8+ T cells in COAD, READ, LUSC, and UCEC.
Additionally, ESRG was altered in 0.77% of the queried samples, with deep deletions being the most prevalent alteration, followed by amplifications. Conclusion: Analysis of ESRG across various cancer types elucidated its potential as a diagnostic biomarker in COAD, LUSC, READ, and UCEC and as a promising prognostic biomarker in LGG. Our findings provide useful insights for future research.


Introduction
Embryonic stem cell-related gene (ESRG; also known as HESRG) is a long non-coding RNA (lncRNA) located on chromosome 3p14.3, with a full-length transcript of 3151 nucleotides consisting of four exons and three introns; it is found in the nuclei of human embryonic stem cells (hESCs) [1,2]. Although classified as an lncRNA, ESRG contains an open reading frame that encodes small peptides, through which it regulates the self-renewal of human pluripotent stem cells (hPSCs) as part of a transcriptional hierarchy in cooperation with many other genes. It is considered indispensable for cell survival and for the self-renewal and pluripotency of hPSCs [3][4][5][6].
ESRG expression differs among cell types. Earlier studies found ESRG to be expressed exclusively in undifferentiated hESCs, with expression decreasing or disappearing after differentiation, and to be scarcely detectable or absent in most adult tissues [1]. Conversely, more recent studies showed that ESRG is expressed in adult tissues and cells, such as ovarian tissue and fibroblasts, although at lower levels than in pluripotent cells [2,7].
Over the past decades, significant progress has been made in unraveling the molecular mechanisms underlying cancer development and progression. One area of research that has gained attention is the relationship between lncRNA deregulation and cancer and cancer metastasis; this deregulation has been linked to treatment resistance and poor prognosis [8,9]. Additionally, previous studies showed that lncRNA expression correlated with the expression of antisense coding genes, which is associated with cancers and many other diseases [10]. ESRG has been proposed to influence tumor behavior through its interactions with chromatin, different RNA types, and RBPs, which are viewed as critical elements in posttranscriptional gene regulation [11]. Hence, ESRG is an interesting gene to study: these interactions dictate cell behavior and, consequently, susceptibility to malignant transformation, which has led to ESRG being considered an oncogenic lncRNA [12][13][14]. Previous studies have shown that lncRNAs are expressed in various tumor tissues, such as breast, thyroid, colorectal, and gastrointestinal cancers, with potential as prognostic or diagnostic biomarkers [8,13,15,16]. However, the potential of ESRG as a biomarker has been reported in only one study, concerning intracranial germinoma and embryonal carcinoma [17]. In this study, we conducted a comprehensive pan-cancer analysis of ESRG, covering expression, clinicopathological correlation, immune infiltration, and genetic alterations, to determine its diagnostic and prognostic value using various databases.

ESRG expression analysis
The Tumor Immune Estimation Resource (TIMER) 2.0 database (http://timer.cistrome.org/) is an online platform whose "Gene_DE" module was used to estimate ESRG differential expression between tumor and normal tissues from The Cancer Genome Atlas (TCGA) [18]. The Gene Expression Profiling Interactive Analysis (GEPIA) database (http://gepia.cancer-pku.cn/; accessed in 2024) is an online tool for analyzing gene expression in 9736 tumor samples and 8587 normal samples from TCGA and the Genotype-Tissue Expression (GTEx) project. ESRG expression was estimated across various cancers using cutoffs of 0.05 for the p-value and 1.5 for |log2FC| [19]. Furthermore, the University of Alabama at Birmingham Cancer Data Analysis Portal (UALCAN) database (https://ualcan.path.uab.edu/), an online resource for analyzing and exploring cancer data from TCGA, was used to assess the significance of ESRG differential expression. This database was also used to investigate the correlation between ESRG expression and clinicopathological parameters, including stage, race, gender, weight, and age [20].
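The GEPIA filtering criterion described above (p < 0.05 and |log2FC| > 1.5) can be illustrated with a short sketch. It assumes that log2FC is the difference of mean log2(TPM + 1) values between tumor and normal groups and treats both cutoffs as strict inequalities; GEPIA's internal computation may differ, and the p-value is taken as an input rather than computed. The function names are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(tumor_tpm, normal_tpm):
    """Difference of mean log2(TPM + 1) between tumor and normal samples.

    Assumption: a common log2(TPM + 1) convention, not necessarily
    GEPIA's exact internal formula.
    """
    tumor_log = [math.log2(x + 1) for x in tumor_tpm]
    normal_log = [math.log2(x + 1) for x in normal_tpm]
    return mean(tumor_log) - mean(normal_log)


def classify(log2fc, p_value, fc_cutoff=1.5, p_cutoff=0.05):
    """Label a gene 'up', 'down', or 'ns' using the cutoffs stated in the text."""
    if p_value < p_cutoff and log2fc > fc_cutoff:
        return "up"
    if p_value < p_cutoff and log2fc < -fc_cutoff:
        return "down"
    return "ns"
```

For example, tumor TPM values of 7, 15, and 31 against normal values of 0, 1, and 3 give a log2FC of 3.0, which `classify(3.0, 0.01)` labels as "up".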

Survival outcome analysis of HESRG across various cancer types
The Kaplan-Meier plotter (https://kmplot.com/analysis) can assess the correlation between the expression of all genes and survival in more than 35,000 samples from 21 tumor types [21]. The prognostic potential of ESRG was assessed using this database. Hazard ratios (HRs) and p-values or log-rank p-values were used to assess the significance of differences in overall survival (OS) and relapse-free survival (RFS). Moreover, the UALCAN database provides graphs and plots depicting patient survival information for lncRNA genes [20]. GEPIA and TIMER were also used for the same purpose [19].
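To make the survival comparison concrete, the sketch below splits patients into high- and low-expression groups at the median, estimates Kaplan-Meier survival curves, and computes a two-group log-rank p-value. The median split, function names, and toy data are assumptions for illustration; Kaplan-Meier plotter, UALCAN, GEPIA, and TIMER each apply their own grouping rules and implementations.

```python
import math
from statistics import median


def split_high_low(expression):
    """True for patients above the median expression (hypothetical cutoff rule)."""
    cutoff = median(expression)
    return [value > cutoff for value in expression]


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = event, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censorings both leave the risk set
    return curve


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns the p-value of the 1-df chi-square statistic."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for s, _, g in pooled if g == 0 and s >= t)
        n_b = sum(1 for s, _, g in pooled if g == 1 and s >= t)
        d_a = sum(1 for s, e, g in pooled if g == 0 and s == t and e)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        n = n_a + n_b
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

For example, `kaplan_meier([1, 2, 3, 4], [1, 1, 0, 1])` gives survival of 0.75, 0.5, and 0.0 at times 1, 2, and 4; the censored patient at time 3 only shrinks the risk set.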

Immune infiltrates analysis of HESRG across various cancer types
The gene module of TIMER2.0 was used to visualize the correlation of HESRG expression with six immune infiltrates (B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells) in colon adenocarcinoma (COAD), lung squamous cell carcinoma (LUSC), rectum adenocarcinoma (READ), uterine corpus endometrial carcinoma (UCEC), and brain lower-grade glioma (LGG). The Cox proportional hazards model in the survival module of TIMER was then used to explore the association of clinical factors (age and stage), the abundance of the six immune infiltrates, and gene expression with patient survival [18].
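As a sketch of the multivariable model described above, assuming that age, stage, the six immune infiltrate abundances, and ESRG expression enter linearly as covariates (the symbols are our own notation, not TIMER's), the hazard for a patient at time t can be written as:

```latex
h(t \mid \mathbf{x}) = h_0(t)\,\exp\!\Big(\beta_{1}\,\mathrm{age} + \beta_{2}\,\mathrm{stage} + \sum_{k=1}^{6} \gamma_{k}\, I_{k} + \delta\, E_{\mathrm{ESRG}}\Big),
\qquad
\mathrm{HR}_{\mathrm{ESRG}} = e^{\delta}
```

Here h_0(t) is the unspecified baseline hazard, I_k is the estimated abundance of immune cell type k, and E_ESRG is ESRG expression. An HR below 1 (that is, δ < 0) corresponds to a lower hazard with higher ESRG expression, holding the other covariates fixed.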

Genetic alterations analysis using the cBioPortal platform
The cBio Cancer Genomics Portal (https://www.cbioportal.org/) is an open-access tool for exploring and visualizing genetic alterations in cancer genomic datasets [22]. We used it to identify genetic alterations of ESRG in the TCGA Pan-Cancer Atlas studies, which currently provide data from 10,967 tumor samples across 32 cancer studies.

Validation of ESRG expression
We used publicly available datasets from the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/) to verify our findings. Differential expression analysis was carried out using GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/) [23], an interactive online tool that compares two or more groups of samples to identify differentially expressed genes. This allowed us to assess the differential expression of ESRG in datasets corresponding to COAD, LUSC, READ, UCEC, and LGG. The profiles of differentially expressed genes (DEGs) were visualized as volcano plots using the online data visualization and analysis platform at http://www.bioinformatics.com.cn/ [24].
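By default, GEO2R adjusts p-values with the Benjamini-Hochberg false discovery rate procedure. The sketch below applies that adjustment and assigns each gene to a volcano-plot category. The cutoffs (|log2FC| >= 1 and adjusted p < 0.05) and the function names are hypothetical, since the text does not state the thresholds used for the GEO datasets.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def volcano_category(log2fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Hypothetical thresholds; the study does not report its GEO2R cutoffs."""
    if adj_p < p_cutoff and log2fc >= fc_cutoff:
        return "up"
    if adj_p < p_cutoff and log2fc <= -fc_cutoff:
        return "down"
    return "not significant"
```

For raw p-values of 0.01, 0.04, 0.03, and 0.20, the adjusted values are 0.04, about 0.053, about 0.053, and 0.20; the third gene's value is raised to match the fourth-ranked constraint so that the adjusted values never decrease with rank.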

ESRG expression is correlated with clinicopathological parameters
ESRG expression was further investigated by evaluating its association with clinicopathological parameters, including stage, age, race, gender, and weight, in COAD, LUSC, READ, and UCEC using UALCAN.
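The subgroup comparisons above (for example, each cancer stage against normal tissue) rest on two-sample tests. Assuming an unequal-variance (Welch) t-test, which we take to approximate UALCAN's procedure, the statistic and its Welch-Satterthwaite degrees of freedom can be computed as below. Converting them to a p-value needs a t-distribution survival function (for example, scipy.stats.t.sf), which is omitted to keep the sketch dependency-free. The function name is hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    va = variance(group_a) / n_a  # squared standard error of the mean, group A
    vb = variance(group_b) / n_b  # squared standard error of the mean, group B
    t = (mean(group_a) - mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df
```

For example, comparing [1, 2, 3] with [4, 5, 6] gives t of about -3.67 with 4 degrees of freedom.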

FIGURE 5: The correlation between ESRG expression and OS in different cancers using Kaplan-Meier plotter
The red line represents high gene expression and the black line represents low gene expression. p < 0.05.

The correlation between ESRG expression and immune infiltration
Using the TIMER2.0 database, which covers tumor-infiltrating immune cells in over 10,000 RNA-seq samples across 23 cancer types from TCGA, we investigated the correlation between ESRG expression and the abundance of immune cells in COAD, LUSC, READ, UCEC, and LGG.
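TIMER2.0 reports purity-adjusted Spearman correlations between gene expression and immune infiltrate estimates. The sketch below computes a plain Spearman rank correlation (the Pearson correlation of tie-averaged ranks) between ESRG expression and one immune cell abundance vector; it omits the tumor-purity adjustment, and the function names are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank for this tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(expression, infiltration):
    """Unadjusted Spearman correlation (no tumor-purity correction)."""
    return pearson(average_ranks(expression), average_ranks(infiltration))
```

A perfectly inverse ordering, such as expression [1, 2, 3, 4] against infiltration [8, 6, 4, 2], yields a rho of -1; the weak negative correlations described in this study would fall between 0 and -1.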

Genetic alterations analysis using cBioPortal platform
Genetic alterations in ESRG were analyzed using the cBioPortal platform with TCGA datasets. ESRG was altered in 0.77% of the queried samples (10,967 samples from 32 studies). The most prevalent alterations were deep deletions, followed by amplifications. The highest alteration frequency was observed in esophageal adenocarcinoma, with deep deletions in 3.3% of cases (six cases) (Figure 9A). Furthermore, ESRG was unaltered in the vast majority of samples, and survival did not differ significantly between patients with and without ESRG alterations (p = 0.46) (Figure 9B).
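The alteration frequency reported by cBioPortal is the share of queried samples that carry any alteration of the gene. A minimal sketch of that bookkeeping, using hypothetical sample IDs and alteration labels rather than the study's data:

```python
from collections import Counter


def alteration_summary(sample_alterations):
    """Return (percent of samples altered, alteration types ranked by count).

    sample_alterations maps a sample ID to an alteration label,
    or to None when the gene is unaltered in that sample.
    """
    total = len(sample_alterations)
    counts = Counter(a for a in sample_alterations.values() if a is not None)
    altered = sum(counts.values())
    return 100 * altered / total, counts.most_common()
```

With 196 unaltered samples plus three deep deletions and one amplification, the function returns 2.0% and ranks deep deletion first.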
The GSE87211 dataset comprised 160 normal control mucosa samples and 203 rectal tumor samples, GSE149507 contained 18 normal lung and 18 small cell lung cancer samples, and GSE35493 included nine normal brain and 33 brain tumor samples. GSE63514 included 24 normal cervical epithelium and 28 cervical squamous epithelial cancer samples.

Discussion
This study investigated the role of ESRG, an lncRNA, in various cancers. We employed multiple bioinformatics tools and databases to analyze ESRG expression levels, their correlation with clinicopathological parameters and immune cell infiltration, prognostic value, and genetic alterations across different cancer types.
We employed the GEPIA, TIMER, and UALCAN databases to identify notable variations in ESRG expression across a range of malignancies, focusing on findings that were consistent between TIMER and UALCAN. Our findings revealed significant ESRG upregulation in COAD, LUSC, READ, and UCEC, suggesting its potential as a diagnostic biomarker for distinguishing tumor from normal samples. The significant upregulation in COAD and READ aligns with another study, which used quantitative polymerase chain reaction (qPCR) to show aberrant ESRG upregulation in colorectal cancer and reported a negative correlation with overall survival [11]. Regarding LUSC, one study associated ESRG overexpression with chemotherapy resistance, implicating gene regulatory networks (GRNs) [25]. This association is plausible because lncRNAs have been shown to contribute to anticancer therapy resistance [26], and the presence of ESRG in the GRN points to cancer stem cells within the tumor cell population, which are known to induce chemotherapy resistance [27]. To our knowledge, no published data have demonstrated a relationship between ESRG expression and UCEC.
Based on our findings, the significant upregulation of ESRG in COAD, LUSC, READ, and UCEC implies a role in carcinogenesis. However, the association between ESRG expression and cancer is still under investigation. ESRG expression in the cancer cell population might indicate the presence of cells with cancer stem cell potential [27]; this is supported by its critical role in sustaining pluripotency and self-renewal in hPSCs through numerous mechanisms [1]. One study reported that ESRG is a novel target of octamer-binding transcription factor 4 (OCT4) that works with minichromosome maintenance protein 2 (MCM2) to decrease tumor protein p53 signaling [6]. Another study reported that ESRG binds to and stabilizes heterogeneous nuclear ribonucleoprotein A1 (HNRNPA1) in a process involving the ubiquitin-proteasome system [28]. Furthermore, findings suggest that, through its interaction with cytochrome c oxidase subunit II (COXII), ESRG may be crucial in controlling apoptosis of hESCs and thus contribute significantly to the preservation of hESC properties [29]. However, a previous study found that cells retained their regeneration and self-renewal abilities despite knockout of the ESRG gene, suggesting that the role of ESRG may be limited to serving as a biomarker of pluripotency [30].
Furthermore, the UALCAN database was used to examine the correlation between ESRG expression and clinicopathological parameters. Our results highlight significant variations in terms of age for UCEC and in terms of both age and gender for COAD, LUSC, and READ. Regarding race, ESRG was significantly upregulated in Caucasians in COAD and LUSC and in African Americans in COAD, LUSC, and UCEC.
Regarding weight, ESRG was significantly upregulated only in COAD. Moreover, our findings revealed significant variations in ESRG expression between normal tissue and different cancer stages in COAD, LUSC, and READ, whereas in UCEC, ESRG showed significant differential expression only in stage 1. To our knowledge, this study is the first to demonstrate a significant association between ESRG expression and clinicopathological parameters. These variations across cancer stages may point to a role of ESRG in tumor progression, which requires further study.
The prognostic analysis was performed with three databases (TIMER, UALCAN, and GEPIA) to investigate the correlation between ESRG expression and survival, considering only results consistent across all three. Higher ESRG expression was associated with a favorable prognosis in LGG. These findings indicate a significant association between ESRG expression and patient survival and support its potential as a prognostic biomarker in LGG.
In the existing literature, one study examined ESRG expression in various intracranial malignancies, found that ESRG was expressed only in embryonal carcinoma and germinoma and barely in other types of brain tumors, and concluded that ESRG is a sensitive biomarker for these tumors [17].
In addition, this study examined the association between ESRG expression and immune cell infiltration across COAD, LUSC, READ, UCEC, and LGG. Our findings indicate weak negative correlations between ESRG expression and the abundance of CD8+ T cells in COAD, LUSC, READ, and LGG; of CD4+ T cells, macrophages, neutrophils, and dendritic cells in LUSC and LGG; and of B cells in LGG. These significant correlations suggest potential immunosuppressive effects of ESRG that may contribute to immune evasion and tumor progression. However, the precise mechanisms underlying this association require further investigation.
Additionally, genetic alteration analysis using the cBioPortal platform revealed that ESRG alterations are rare across cancer types, with deep deletions being the most prevalent. However, ESRG alterations appear to have minimal impact on patient survival, suggesting that other factors may predominantly influence ESRG-mediated carcinogenesis.
To our knowledge, our research is the first pan-cancer analysis of ESRG. It provides a comprehensive analysis of ESRG expression and its role in cancer development and progression across various cancer types. Our results provide a foundation for further investigating the association between ESRG expression and immune cell abundance, given their complex interaction with patient survival in different cancer types. These results may serve as a basis for using ESRG expression as a biomarker for the diagnosis and monitoring of the cancers discussed in this study.
This study has several limitations. First, the analysis depends on bioinformatics tools and publicly accessible datasets, so experimental wet-lab analysis is required to verify our findings. Second, the functional roles of ESRG in immune regulation and cancer progression remain incompletely understood and need further investigation. Third, the study focuses primarily on ESRG expression levels and their association with clinical outcomes in different cancers, without addressing potential regulatory mechanisms and interactions with other molecular pathways. Finally, the number of available uterine carcinoma datasets is limited, and ESRG was not among the DEGs obtained from GEO2R analysis; therefore, we used cervical carcinoma datasets to validate our findings for uterine carcinoma.

Conclusions
In conclusion, our comprehensive pan-cancer analysis of ESRG demonstrated its potential as a diagnostic biomarker in COAD, LUSC, READ, and UCEC and as a promising prognostic biomarker in LGG. Furthermore, we found significant variations in ESRG expression between normal tissue and different cancer stages in COAD, LUSC, and READ, whereas in UCEC, ESRG showed significant differential expression only in stage 1. Moreover, our results demonstrate a significant negative correlation between ESRG expression and the abundance of CD8+ T cells in COAD, READ, LUSC, and UCEC. Additionally, ESRG was altered in 0.77% of the queried samples, with deep deletions being the most prevalent alteration, followed by amplifications.

FIGURE 1: ESRG expression analysis in various tumors using the TIMER2.0 database. The red columns represent tumor tissues and the blue columns represent normal tissues; the stars indicate the significance of differences between tumor and normal samples. *p < 0.05, **p < 0.01, and ***p < 0.001.

FIGURE 3: Correlation between HESRG gene expression and clinicopathological features (stage, gender, race, age, and weight) in COAD and READ using UALCAN. (A) Expression in COAD based on stage; (B) Expression in COAD based on race; (C) Expression in COAD based on gender; (D) Expression in COAD based on age; (E) Expression in COAD based on weight; (F) Expression in READ based on stage; Expression in READ based on age; (G) Expression in READ based on gender. *p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001, and *****p < 0.00001. HESRG: embryonic stem cell-related gene; UALCAN: University of Alabama at Birmingham Cancer Data Analysis Portal; p: p-value; COAD: colon adenocarcinoma; READ: rectum adenocarcinoma; n: number of samples

FIGURE 6: The correlation between ESRG expression and survival outcome. (A) The correlation between ESRG expression and survival outcome in SKCM and COAD using UALCAN. (B) The correlation between ESRG expression and survival outcome in LGG using GEPIA and UALCAN. HR: hazard ratio; logrank P: p-value resulting from the log-rank test; p: p-value; HESRG/ESRG: embryonic stem cell-related gene; COAD: colon adenocarcinoma; SKCM: skin cutaneous melanoma; LGG: brain lower-grade glioma; TPM: transcripts per million

FIGURE 8: Kaplan-Meier plots showing the correlation between ESRG expression, immune cell infiltrates (B cells, CD4+ T cells, CD8+ T cells, macrophages, neutrophils, and dendritic cells), and the survival outcome of patients with cancer using the TIMER database.

FIGURE 9: Genetic alteration analysis of ESRG using the cBioPortal platform. (A) Genetic alteration frequency of ESRG in various cancers. (B) The correlation between the survival outcome of patients with different cancer types and ESRG genetic alterations. ESRG: embryonic stem cell-related gene; TCGA: The Cancer Genome Atlas; CNA: copy number alterations; logrank P: p-value resulting from the log-rank test

TABLE 6: Number of differentially expressed genes (upregulated and downregulated genes) in four GEO datasets. GEO: Gene Expression Omnibus; DEGs: differentially expressed genes

FIGURE 11: GEPIA expression analysis of HESRG between TCGA tumor tissues and GTEx normal tissues. *p < 0.05. HESRG: embryonic stem cell-related gene; GEPIA: Gene Expression Profiling Interactive Analysis; TCGA: The Cancer Genome Atlas; p: p-value; TGCT: testicular germ cell tumors; num(T): the number of tumor samples; num(N): the number of normal samples